Title: METHOD FOR CONSTRUCTING A LIBRARY USING DNA SHUFFLING



FIELD OF THE INVENTION 

The present invention relates to optimizing DNA sequences
in order to (a) improve the properties of a protein of interest
by artificial generation of genetic diversity of genes encoding
proteins having a biological activity of interest by the use of
the so-called gene- or DNA shuffling technique to create a
large library of "genes", expressing said library of genes in a
suitable expression system and screening the expressed proteins
in respect of specific characteristics to determine such
proteins exhibiting desired properties or (b) improve the
properties of regulatory elements such as promoters or
terminators by generation of a library of these elements,
transforming suitable hosts therewith in operable conjunction
with a structural gene, expressing said structural gene and
screening for desirable properties in the regulatory element.

BACKGROUND OF THE INVENTION 

It is generally found that a protein performing a certain
bioactivity exhibits a certain variation between genera, and
differences may exist even between members of the same species.
This variation is of course even more pronounced at the genomic
level.

This natural genetic diversity among genes coding for
proteins having basically the same bioactivity has been
generated in Nature over billions of years and reflects a natural
optimization of the proteins coded for in respect of the
environment of the organism in question.

In today's society the conditions of life are vastly
removed from the natural environment and it has been found that
the naturally occurring bioactive molecules are not optimized
for the various uses to which they are put by mankind,
especially when they are used for industrial purposes.

It has therefore been of interest to industry to identify
such bioactive proteins that exhibit optimal properties in
respect of the use for which they are intended.

This has for many years been done by screening of natural
sources, or by use of mutagenesis. For instance, within the
technical field of enzymes for use in e.g. detergents, the
washing and/or dishwashing performance of e.g. naturally
occurring proteases, lipases, amylases and cellulases have been
improved significantly by in vitro modifications of the enzymes.

In most cases these improvements have been obtained by
site-directed mutagenesis resulting in substitution, deletion
or insertion of specific amino acid residues which have been
chosen either on the basis of their type or on the basis of
their location in the secondary or tertiary structure of the
mature enzyme (see for instance US patent no. 4,518,584).

In this manner the preparation of novel polypeptide
variants and mutants, such as novel modified enzymes with altered
characteristics, e.g. specific activity, substrate specificity,
thermal, pH and salt stability, pH-optimum, pI, Km, Vmax etc.,
has successfully been performed to obtain polypeptides with
improved properties.

For instance, within the technical field of enzymes the
washing and/or dishwashing performance of e.g. proteases,
lipases, amylases and cellulases have been improved
significantly.

An alternative general approach for modifying proteins
and enzymes has been based on random mutagenesis, for instance,
as disclosed in US 4,894,331 and WO 93/01285.

As it is a cumbersome and time consuming process to
obtain polypeptide variants or mutants with improved functional
properties a few alternative methods for rapid preparation of
modified polypeptides have been suggested.

Weber et al., (1983), Nucleic Acids Research, vol. 11,
5661-5669, describes a method for modifying genes by in vivo
recombination between two homologous genes. A linear DNA
sequence comprising a plasmid vector flanked by a DNA sequence
encoding alpha-1 human interferon in the 5'-end and a DNA
sequence encoding alpha-2 human interferon in the 3'-end is
constructed and transfected into a recA positive strain of E.
coli. Recombinants were identified and isolated using a
resistance marker.

Pompon et al., (1989), Gene 83, p. 15-24, describes a
method for shuffling gene domains of mammalian cytochrome P-450
by in vivo recombination of partially homologous sequences in
Saccharomyces cerevisiae by transforming Saccharomyces
cerevisiae with a linearized plasmid with filled-in ends, and a
DNA fragment being partially homologous to the ends of said
plasmid.

In WO 97/07205 a method is described whereby polypeptide
variants are prepared by shuffling different nucleotide
sequences of homologous DNA sequences by in vivo recombination
using plasmid DNA as template.

US patent no. 5,093,257 (Assignee: Genencor Int. Inc.)
discloses a method for producing hybrid polypeptides by in vivo
recombination. Hybrid DNA sequences are produced by forming a
circular vector comprising a replication sequence, a first DNA
sequence encoding the amino-terminal portion of the hybrid
polypeptide, a second DNA sequence encoding the carboxy-terminal
portion of said hybrid polypeptide. The circular vector is
transformed into a rec positive microorganism in which the
circular vector is amplified. This results in recombination of
said circular vector mediated by the naturally occurring
recombination mechanism of the rec positive microorganism, which
include prokaryotes such as Bacillus and E. coli, and eukaryotes
such as Saccharomyces cerevisiae.

One method for the shuffling of homologous DNA sequences
has been described by Stemmer (Stemmer, (1994), Proc. Natl.
Acad. Sci. USA, Vol. 91, 10747-10751; Stemmer, (1994), Nature,
vol. 370, 389-391). The method concerns shuffling homologous
DNA sequences by using in vitro PCR techniques. Positive
recombinant genes containing shuffled DNA sequences are selected
from a DNA library based on the improved function of the
expressed proteins.

The above method is also described in WO 95/22625. WO
95/22625 relates to a method for shuffling of homologous DNA
sequences. An important step in the method described in WO
95/22625 is to cleave the homologous template double-stranded
polynucleotide into random fragments of a desired size followed
by homologously reassembling of the fragments into full-length
genes.

A disadvantage inherent to the method of WO 95/22625 is,
however, that the diversity generated through that method is
limited due to the use of homologous gene sequences (as defined
in WO 95/22625).

Another disadvantage in the method of WO 95/22625 lies in
the production of the random fragments by the cleavage of the
template double-stranded polynucleotide.

WO 98/41622    PCT/DK98/00103

A further reference of interest is WO 95/17413 describing
a method of gene or DNA shuffling by recombination of specific
DNA sequences - so-called design elements (DE) - either by
recombination of synthesized double-stranded fragments or
recombination of PCR generated sequences to produce so-called
functional elements (FE) comprising at least two of the design
elements. According to the method described in WO 95/17413 the
recombination has to be performed among design elements that
have DNA sequences with sufficient sequence homology to enable
hybridization of the different sequences to be recombined.

WO 95/17413 therefore also entails the disadvantage that
the diversity generated is relatively limited. Furthermore the
methods described are time consuming, expensive, and not suited
for automation.

Despite the existence of the above methods there is still
a need for better iterative in vivo recombination methods for
preparing novel polypeptide variants. Such methods should also
be capable of being performed in small volumes, and amenable to
automation.

Furthermore, there also is a need for methods providing
the possibility of being able to shuffle genes with relatively
low homology.

SUMMARY OF THE INVENTION 

The present invention relates to a method for the
construction of a library of recombined polynucleotides from a
number of different starting single or double stranded parental
DNA templates, wherein said starting single or double stranded
parental DNA templates represent discrete points in a
population of genes encoding evolutionary or synthetic homologues
of a peptide having homologies ranging over a broad spectrum from
less than 15% to more than 80%, said population exhibiting at
least one identification sequence, and whereby said genes are
subjected to a gene shuffling procedure to generate shuffled
mutants of said population of genes representing additional
discrete points between those of said starting templates.

The gene shuffling procedure to be used according to the
invention can be any suitable method, such as those described
above, or a procedure as described in our co-pending patent
application filed on the same date, and outlined below.

According to that procedure template shifts of newly
synthesized DNA strands during in vitro DNA synthesis are
utilized to achieve DNA shuffling.

In a further aspect the invention relates to a method of
identifying polypeptides exhibiting improved properties in
comparison to naturally occurring polypeptides of the same
bioactivity, whereby a library of recombined polynucleotides
produced by the above process is cloned into an appropriate
vector, said vector is then transformed into a suitable host
system to be expressed into the corresponding polypeptides, said
polypeptides are then screened in a suitable assay, and
positive results selected.

In a still further aspect the invention relates to a
process for producing a polypeptide of interest as identified in
the preceding process, whereby a vector comprising a
polynucleotide encoding said polypeptide is transformed into a
suitable host, said host is grown to express said polypeptide, and
the polypeptide recovered and purified.

DEFINITIONS

Prior to discussing this invention in further detail, the
following terms will first be defined.

"Shuffling": The term "shuffling" herein means
recombination of nucleotide sequence fragment(s) between two or
more polynucleotides resulting in output polynucleotides (i.e.
polynucleotides having been subjected to a shuffling cycle)
having a number of nucleotide fragments exchanged, in
comparison to the input polynucleotides (i.e. starting point
polynucleotides).

"Homology of DNA sequences or polynucleotides": In the
present context the degree of DNA sequence homology is
determined as the degree of identity between two sequences
indicating a derivation of the first sequence from the second. The
homology may suitably be determined by means of computer programs
known in the art, such as GAP provided in the GCG program
package (Program Manual for the Wisconsin Package, Version 8,
August 1994, Genetics Computer Group, 575 Science Drive,
Madison, Wisconsin, USA 53711) (Needleman, S.B. and Wunsch, C.D.,
(1970), Journal of Molecular Biology, 48, 443-453).

"Homologous": The term "homologous" means that one
single-stranded nucleic acid sequence may hybridize to a
complementary single-stranded nucleic acid sequence. The degree of
hybridization may depend on a number of factors including the
amount of identity between the sequences and the hybridization
conditions such as temperature and salt concentration as
discussed later (vide infra).

Using the computer program GAP (vide supra) with the
following settings for DNA sequence comparison: GAP creation
penalty of 5.0 and GAP extension penalty of 0.3, it is in the
present context believed that two DNA sequences will be able to
hybridize (using low stringency hybridization conditions as
defined below) if they mutually exhibit a degree of identity
preferably of at least 70%, more preferably at least 80%, and
even more preferably at least 85%.

"Heterologous": If two or more DNA sequences mutually
exhibit a degree of identity which is less than above specified,
they are in the present context said to be "heterologous".
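
The identity thresholds given in these definitions can be illustrated with a minimal sketch. It is not the GAP program: it assumes the two sequences have already been aligned (gaps written as "-") and simply counts identical columns, so its figures may differ from GAP output. The function names and the 70% default, taken from the lowest threshold stated above, are illustrative assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are skipped (an assumption of this sketch)
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def classify(aligned_a, aligned_b, threshold=70.0):
    """Label a pair 'homologous' at or above the threshold, else 'heterologous'."""
    if percent_identity(aligned_a, aligned_b) >= threshold:
        return "homologous"
    return "heterologous"


print(classify("ACGTACGTAC", "ACGTACGTTT"))  # 8 of 10 columns identical
print(classify("AAAAAAAAAA", "AAAAATTTTT"))  # 5 of 10 columns identical
```

In practice the alignment itself would come from a dedicated tool with the gap penalties quoted above; the sketch only shows how the threshold separates the two labels.
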

"Hybridization": Suitable experimental conditions for
determining if two or more DNA sequences of interest do
hybridize or not are herein defined as hybridization at low
stringency as described in detail below.

Molecules to which the oligonucleotide probe hybridizes
under these conditions are detected using an X-ray film or a
phosphoimager.

"Primer": The term "primer" used herein especially in
connection with a PCR reaction is an oligonucleotide
(especially a "PCR-primer") defined and constructed according
to general standard specification known in the art ("PCR - A
practical approach" IRL Press, (1991)).

"A primer directed to a sequence": The term "a primer
directed to a sequence" means that the primer (preferably to be
used in a PCR reaction) is constructed to exhibit at least 80%
degree of sequence identity to the sequence part of interest,
more preferably at least 90% degree of sequence identity to the
sequence part of interest, which said primer consequently is
"directed to". The primer is designed in order to specifically
anneal at the region at a given temperature it is directed
towards. Especially identity at the 3' end of the primer is
essential for the function of the polymerase, i.e. the ability of
a polymerase to extend the annealed primer.

"Flanking": The term "flanking" used herein in connection
with DNA sequences comprised in a PCR-fragment means the
outermost partial sequences of the PCR-fragment, both in the 5' and
3' ends of the PCR fragment.
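
As an illustration only, the following sketch checks whether a primer would count as "directed to" a target site in the sense defined above, using the 80% identity figure. The exact-match window of three bases at the 3' end is an assumption made for the example, not a value given in this text, and the function names are hypothetical. Both strings are written 5' to 3' and must have equal length.

```python
def primer_identity(primer, site):
    """Percent identity between a primer and an equally long target site."""
    if len(primer) != len(site) or not primer:
        raise ValueError("primer and site must be non-empty and of equal length")
    matches = sum(p == s for p, s in zip(primer.upper(), site.upper()))
    return 100.0 * matches / len(primer)


def is_directed_to(primer, site, min_identity=80.0, three_prime_window=3):
    """True if overall identity is sufficient and the 3' end matches exactly."""
    # The 3' window size is an assumed parameter of this sketch.
    tail_matches = primer.upper()[-three_prime_window:] == site.upper()[-three_prime_window:]
    return tail_matches and primer_identity(primer, site) >= min_identity


site = "ACGTACGTAC"
print(is_directed_to("TCGTACGTAC", site))  # one 5' mismatch, 3' end intact
print(is_directed_to("ACGTACGTAG", site))  # mismatch at the 3' terminal base
```

The second call is rejected even though its overall identity is 90%, which mirrors the emphasis the definition places on the 3' end.
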

"Polypeptide": Polymers of amino acids sometimes referred
to as protein. The sequence of amino acids determines the
folded conformation the polypeptide assumes, and this in
turn determines biological properties such as activity. Some
polypeptides consist of a single polypeptide chain
(monomeric), whilst others comprise several associated
polypeptides (multimeric). All enzymes and antibodies are
polypeptides.

"Enzyme": A protein capable of catalysing chemical
reactions. Specific types of enzymes are a) hydrolases including
amylases, cellulases and other carbohydrases, proteases, and
lipases, b) oxidoreductases, c) ligases, d) lyases, e)
isomerases, f) transferases, etc. Of specific interest in
relation to the present invention are enzymes used in
detergents, such as proteases, lipases, cellulases, amylases, etc.






DETAILED DESCRIPTION OF THE INVENTION 

All possible genes encoding a polypeptide of the same
evolutionary origin can be seen as a very large population of
DNA sequences (e.g. {G_sp | the set of genes encoding a serine
protease}). It has been found that the homology between the
polypeptides encoded by single members of such a population may
be even as small as less than 15% (the genes originating from
"distant" organisms).

When searching for polypeptides suited for the various
purposes that mankind has developed, it has been found
difficult, if not impossible at our present level of knowledge, to
conclude in a rational manner on the optimal configuration of
the polypeptide in question. Therefore it was found desirable
to provide a simple method of generating a sub-population of
the above mentioned very large population, but representing a
substantial part of the variation possible within the large
population.



The object of the present invention is thus to provide a
method whereby it is possible to shuffle components of genes
encoding polypeptides of the same functionality, but having
only low homologies.

To this end it is necessary to obtain a reasonable
knowledge of the population in question, meaning having at disposal
a number of individual members (e.g. 5, 10, 15 or more
members) representing as high a variation as possible. This small
sub-population is then used as a starting point for generating
a much larger sub-population of genes. The corresponding
polypeptides of the large sub-population obtained are then
displayed and screened in an appropriate manner to identify such
members of the large sub-population that are optimal for the
intended purpose.

It was found that the expansion of the starting
sub-population to the large sub-population could be accomplished
using gene shuffling methods.

Such methods as described in the literature provide means
to exchange DNA fragments between genes coding for polypeptides
of a reasonably high homology, typically to be above 80%,
resulting in the generation of novel genes encoding polypeptides
having homologies between 80% and 99%.

It was also found that in the method of the invention it 
was necessary as starting population to use genes encoding 
polypeptides that are at least from 70% to 80% homologous to at 
least one other gene in the starting population. 

According to the invention it is thus important to start
from a population or sub-set of genes which comprises
intermediate sequences ranging from genes being rather similar to
genes being rather dissimilar, but still having the same
evolutionary origin (function). Only then a shuffling of even rather
heterologous sequences is feasible. The stepwise shuffling of,
first, quite homologous genes creates new species which are not
contained in the starting population, and which, in the
subsequent shuffling rounds, will recombine with each other and with
other more heterologous genes from the starting population, and
so on.

Finally, hybrids are generated in which sequence parts
from very heterologous starting genes can be found. These
retrieved starting genes would have never been shuffled without
having the intermediate species in the starting population
because of a too large "sequence space" distance.
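
The role of intermediate species described above can be pictured as a graph problem. In the sketch below, which is an illustrative assumption rather than part of the described method, genes are nodes, two genes are linked when their identity reaches a threshold (70% is used, following the lower figure in the text), and the starting population is considered bridged when every gene can be reached from every other through such links. The sequences are assumed to be pre-aligned to equal length, and all names are hypothetical.

```python
from collections import deque


def identity(a, b):
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def is_bridged(population, threshold=70.0):
    """True if chains of pairs at or above the threshold connect all genes."""
    names = list(population)
    if not names:
        return True
    seen = {names[0]}
    queue = deque([names[0]])
    while queue:  # breadth-first search over the identity graph
        current = queue.popleft()
        for other in names:
            if other not in seen and identity(population[current], population[other]) >= threshold:
                seen.add(other)
                queue.append(other)
    return len(seen) == len(names)


distant = {"g1": "AAAAAAAAAA", "g3": "AAAATTTTTT"}  # 40% identical to each other
bridged = dict(distant, g2="AAAAAAATTT")             # 70% identical to g1 and to g3
print(is_bridged(distant), is_bridged(bridged))      # False True
```

Adding the intermediate gene g2 is what makes the two distant genes reachable from each other, which is the situation the stepwise shuffling argument relies on.
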

Having this condition fulfilled it was found that it was
possible to generate genes encoding novel functional
polypeptides having a homology as low as the minimum degree of
homology represented in the starting population. In principle the
homology range in the final population is the same as
that for the starting population.

The present invention relates in its first aspect to a
method for the construction of a library of recombined
polynucleotides from a number of different starting single or double
stranded parental DNA templates, wherein said starting single
or double stranded parental DNA templates represent discrete
points in a population of genes encoding evolutionary or
synthetic homologues of a peptide having homologies ranging over a
broad spectrum from less than 15% to more than 80%, said
population exhibiting at least one identification sequence, and
whereby said genes are subjected to a gene shuffling procedure
to generate shuffled mutants of said population of genes
representing additional discrete points between those of said
starting templates.

According to the invention it is possible to use parental
DNA templates representing homologies ranging from less than
45%, 40%, 35%, 30%, 25%, 20%, or 15% to more than 80%, 85%,
90%, 95%, or 99%.

In specific embodiments at least one identification
sequence is identified and primers constructed to anneal thereto.
These sequences can be located anywhere on the genes.

In a preferred embodiment at least two identification
sequences are identified. These sequences can be located at any
distance from each other, but it is preferred that they are
located as far as possible from each other on the genes.

According to these embodiments said identification
sequences may correspond to an amino acid sequence of from 4 to 8
amino acid residues, which sequence is highly conserved among
the peptides encoded by the collection of starting single or
double stranded parental DNA templates, preferably from 5 to 7
amino acid residues.

It is preferred that the identification sequences are
located a distance apart corresponding to the average size of the
genes in said collection with a variation of up to 40%. The
further apart the sequences are, the larger the part of the gene
that is shuffled.

However, situations may arise where it is desired only
to shuffle the sequences between identification sequences
located quite close to each other.
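
One way to look for candidate identification sequences, sketched here under stated assumptions, is to scan an alignment of the encoded peptides for runs of fully conserved columns. The function name is hypothetical, the peptides below are invented toy sequences, the input is assumed to be pre-aligned to equal length, and only the lower bound of four residues mentioned above is applied.

```python
def conserved_runs(aligned, min_len=4):
    """Return (start, segment) for each run of fully conserved, gap-free columns."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal aligned length")
    runs = []
    start = None
    for i, column in enumerate(zip(*aligned)):
        conserved = len(set(column)) == 1 and column[0] != "-"
        if conserved and start is None:
            start = i  # a new conserved run begins here
        elif not conserved and start is not None:
            if i - start >= min_len:
                runs.append((start, aligned[0][start:i]))
            start = None
    if start is not None and len(aligned[0]) - start >= min_len:
        runs.append((start, aligned[0][start:]))  # run reaching the end
    return runs


toy_peptides = [
    "MKVHDSGIAATRYWDCCKPLE",
    "MRLHDSGIDSTRYWDCCKPAQ",
    "MQAHDSGVNPTRYWDCCKPSE",
]
print(conserved_runs(toy_peptides))  # [(3, 'HDSG'), (10, 'TRYWDCCKP')]
```

Primers would then be designed against the DNA encoding such runs; the distance between two reported start positions gives a rough idea of how much of the gene lies between the identification sequences.
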

As indicated above the gene shuffling method used in the
method of the invention is of less or no significance. In
principle any method will work.

Thus the methods disclosed in WO 95/22625 and WO 95/17413
are fully operable in the present invention. Details showing
how these methods may be used for practising the present
invention are indicated in the Examples below.

Therefore further gene shuffling methods described in
co-filed patent applications are also contemplated for use in the
present method.

According to one of these procedures template shifts of
newly synthesized DNA strands during in vitro DNA synthesis are
utilized to achieve DNA shuffling.

More specifically that method provides for the
construction of a library of recombined homologous polynucleotides from
a number of different starting single or double stranded
parental DNA templates and primers by induced template shifts during
an in vitro polynucleotide synthesis using a polymerase,
whereby

A. extended primers or polynucleotides are synthesized by

a) denaturing parental double stranded DNA templates to
produce single stranded templates,
b) annealing said primers to the single stranded DNA
templates,

c) extending said primers by initiating synthesis by use
of said polymerase,

d) causing arrest of the synthesis, and

e) denaturing the double strand to separate the extended
primers from the templates,

B. a template shift is induced by

a) isolating the newly synthesized single stranded
extended primers from the templates and repeating steps
A.b) to A.e) using said extended primers produced in
(A) as both primers and templates, or

b) repeating steps A.b) to A.e),

C. the above process is terminated after an appropriate
number of cycles of process steps A. and B.a), A. and B.b)
or combinations thereof, and

D. optionally the produced polynucleotides are amplified in a
standard PCR reaction with specific primers to selectively
amplify polynucleotides of interest.
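
The cycle of steps A to C can be mimicked with a small in silico simulation. This is a deliberately simplified sketch, not the laboratory procedure: it ignores strand complementarity and annealing specificity, assumes the templates are pre-aligned to equal length, and models arrested synthesis as copying a random stretch of at most `max_extension` bases per cycle. The function and parameter names are hypothetical.

```python
import random


def template_shift_shuffle(templates, cycles, max_extension, seed=0):
    """Grow one strand by copying a short stretch from a randomly chosen template per cycle."""
    if not templates or len({len(t) for t in templates}) != 1:
        raise ValueError("templates must be non-empty and pre-aligned to equal length")
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    length = len(templates[0])
    product = []
    for _ in range(cycles):
        if len(product) >= length:
            break  # full-length product reached
        template = rng.choice(templates)      # re-annealing, possibly to another template
        step = rng.randint(1, max_extension)  # synthesis is arrested after a short extension
        start = len(product)
        product.extend(template[start:start + step])
    return "".join(product)


parents = ["A" * 20, "C" * 20, "G" * 20]
print(template_shift_shuffle(parents, cycles=100, max_extension=5, seed=1))
```

Each position of the printed strand is copied from one of the three parents, so the run lengths of A, C and G show where simulated template shifts occurred. A smaller `max_extension` gives more crossovers per product.
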

In a further specific embodiment the gene shuffling is
performed by the method described in our co-filed application,
whereby conserved regions of heterologous DNA sequences are
identified for shuffling of heterologous DNA sequences of
interest having at least one conserved region comprising the
following steps:

i) One or more conserved region(s) (designated "A, B, C"
etc.) in two or more of the heterologous sequences are
identified.

ii) Two sets of PCR primers (each set comprising a sense and
an anti-sense primer) for one or more conserved region(s)
identified in (i) are constructed.

In these primers, one set (named: "a"=sense primer;
"a'"=anti-sense primer) is directed to a sequence region 5'
(sense strand) of the conserved region (e.g. conserved region
"A"), and the second set (named: "b"=sense primer;
"b'"=anti-sense primer) is directed to a sequence region 3' (sense
strand) of the conserved region (e.g. conserved region "A"),
and the anti-sense primer "a'" and the sense primer "b" have a
homologous sequence overlap of at least 10 base pairs (bp)
within the conserved region.


iii) For one or more identified conserved region of interest
in step (i) two PCR amplification reactions are performed
using the heterologous DNA sequences from step (i) as templates,
whereby one of the PCR reactions is using the 5' primer set
identified in step (ii) (e.g. named "a", "a'") and the second
PCR reaction is using the 3' primer set identified in step
(ii) (e.g. named "b", "b'").

iv) The PCR fragments generated as described in step (iii)
are isolated for one or more identified conserved region in step
(i).

v) Two or more PCR fragments isolated in step (iv) are
pooled, and a sequence overlap extension PCR reaction (SOE-PCR)
is performed using the isolated PCR fragments as templates.


vi) The PCR fragment(s) obtained in step (v) are isolated,
whereby the isolated PCR fragment comprises numerous different
shuffled sequences containing a shuffled mixture of the PCR
fragments isolated in step (iv).
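
The outcome of steps (i) to (vi) for a single conserved region can be pictured with a short sketch. It does not model primers or PCR chemistry: it assumes each gene contains the conserved region exactly as given, splits each gene there, and joins every 5' part with every 3' part through the shared region, which is the set of products that overlap extension would be expected to allow. The gene sequences, the 10-base conserved region and the function name are invented for illustration.

```python
from itertools import product


def soe_recombine(genes, conserved):
    """Join every 5' fragment with every 3' fragment through the shared conserved region."""
    upstream = []
    downstream = []
    for gene in genes:
        pos = gene.find(conserved)  # first occurrence of the conserved region
        if pos < 0:
            raise ValueError("conserved region not found in " + gene)
        upstream.append(gene[:pos])                     # 5' part, overlap excluded
        downstream.append(gene[pos + len(conserved):])  # 3' part, overlap excluded
    return sorted({five + conserved + three
                   for five, three in product(upstream, downstream)})


conserved_region = "ATGGCCGATC"  # toy 10-base overlap
genes = ["AAAC" + conserved_region + "TTTG",
         "CCAT" + conserved_region + "GGTA"]
for shuffled in soe_recombine(genes, conserved_region):
    print(shuffled)
```

With two parent genes and one conserved region the sketch yields four sequences: the two parents and the two reciprocal hybrids. With several conserved regions the same splitting could be applied at each region in turn, so the number of possible hybrids grows multiplicatively.
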


In specific embodiments various modifications can be made
in the process of the invention. For example it is advantageous
to apply a defective polymerase, either an error-prone
polymerase to introduce mutations in comparison to the templates,
or a polymerase that will discontinue the polynucleotide
synthesis prematurely to effect the arrest of the reaction.

According to a specific embodiment the peptide is a
protease, especially a subtilase.

In the case of a subtilase identification sequences may
be located around the aspartic acid in position 32, or the
histidine in position 64 and the active serine in position 221 of
subtilisin BPN'.

In a further embodiment the peptide is an amylase,
especially an α-amylase.

In that case identification sequences may be located
around the Asp in position 100 and the Asp in position 328 of
B. licheniformis α-amylase.

For α-amylases from Bacillus species the identification
sequences may preferentially be located around Tyr in position
6 and around Ser in position 4"€.

In further embodiments the peptide is a lipase, or a
cellulase.

In respect of lipases, suitable identification sequences
may be found by using the lipase alignment shown in A. Svendsen
et al. (1995): Biochemical properties of cloned lipases from
the Pseudomonas family, Biochimica et Biophysica Acta 1259,
9-17. Examples could be around the Pro in position 10 or around
the His in position 285 (using P. glumae lipase numbering).

In respect of cellulases, in particular cellulases from
family 45 cellulases (see WO 96/29397), suitable identification
sequences may be the conserved region "Thr Arg Tyr Trp Asp Cys
Cys Lys Pro/Thr" and the conserved region "Trp Arg Phe/Tyr
Asp Trp Phe". For further details relating to those cellulase
identification sequences reference is made to (PCT
DK97/00216). See in particular example 3 of (PCT
DK97/00216).

In respect of xylanases, in particular xylanases from
family 11 xylanases, suitable identification sequences may be
the conserved regions "DGGTYDIY" and "EGYQSSG". For further
details relating to those xylanase identification sequences
reference is made to (PCT DK97/00216). See in particular
example 1, 2 of (PCT DK97/00216).

PCR-primers:

The PCR primers are constructed according to the standard
descriptions in the art. Normally they are 10-75 base-pairs
(bp) long. However, for the specific embodiment using random or
semi-random primers the length may be substantially longer as
indicated above.

PCR-reactions:

If not otherwise mentioned the PCR-reactions performed
according to the invention are performed according to standard
protocols known in the art.

The term "Isolation of PCR fragment" is intended to cover
something as broad as simply an aliquot containing the PCR
fragment. However preferably the PCR fragment is isolated to an
extent which removes surplus of primers, nucleotides, templates etc.

In an embodiment of the invention the DNA fragment(s)
is (are) prepared under conditions resulting in a low, medium or
high random mutagenesis frequency.

To obtain low mutagenesis frequency the DNA sequence(s)
(comprising the DNA fragment(s)) may be prepared by a standard
PCR amplification method (US 4,683,202 or Saiki et al., (1988),
Science 239, 487-491).

A medium or high mutagenesis frequency may be obtained by
performing the PCR amplification under conditions which
increase the misincorporation of nucleotides, for instance as
described by Deshler, (1992), GATA 9(4), 103-106; Leung et al.,
(1989), Technique, Vol. 1, No. 1, 11-15.

It is also contemplated according to the invention to
combine the PCR amplification (i.e. according to this
embodiment also DNA fragment mutation) with a mutagenesis step using
a suitable physical or chemical mutagenizing agent, e.g., one
which induces transitions, transversions, inversions,
scrambling, deletions, and/or insertions.

Expressing the recombinant protein from the recombinant
shuffled sequences

Expression of the recombinant protein encoded by the
shuffled sequence in step vi) of the second and third aspect of
the present invention may be performed by use of standard
expression vectors and corresponding expression systems known in
the art.

Screening and selection

In the context of the present invention the term
"positive polypeptide variants" means resulting polypeptide
variants possessing functional properties which have been
improved in comparison to the polypeptides producible from the
corresponding input DNA sequences. Examples of such improved
properties can be as different as e.g. enhanced or lowered
biological activity, increased wash performance, thermostability,
oxidation stability, substrate specificity, antibiotic
resistance etc.

Consequently, the screening method to be used for
identifying positive variants depends on which property of the
polypeptide in question it is desired to change, and in what
direction the change is desired.

A number of suitable screening or selection systems to
screen or select for a desired biological activity are
described in the art. Examples are:

Strausberg et al. (Biotechnology 13: 669-673 (1995))
describes a screening system for subtilisin variants having a
calcium-independent stability;

Bryan et al. (Proteins 1:326-334 (1986)) describes a 
screening assay for protease having enhanced thermal stability; 
and 

PCT-DK96/00322 describes a screening assay for lipases
having an improved wash performance in washing detergents.

An embodiment of the invention comprises screening or
selection of recombinant protein(s), wherein the desired
biological activity is performance in dish-wash or laundry detergents.
Examples of suitable dish-wash or laundry detergents are
disclosed in PCT-DK96/00322 and WO 95/30011.

If the improved functional property of the polypeptide is
not sufficiently good after one cycle of shuffling, the 
polypeptide may be subjected to another cycle. 

In an embodiment of the invention wherein polynucleotides
representing a number of mutations of the same gene are used as
templates, at least one shuffling cycle is a backcrossing cycle
with the initially used DNA fragment, which may be the
wild-type DNA fragment. This eliminates non-essential mutations.
Non-essential mutations may also be eliminated by using
wild-type DNA fragments as the initially used input DNA material.

Also contemplated to be within the invention are
polypeptides having biological activity such as insulin, ACTH,
glucagon, somatostatin, somatotropin, thymosin, parathyroid hormone,
pituitary hormones, somatomedin, erythropoietin, luteinizing
hormone, chorionic gonadotropin, hypothalamic releasing factors,
antidiuretic hormones, thyroid stimulating hormone, relaxin,
interferon, thrombopoietin (TPO) and prolactin.

It is also contemplated according to the invention to
shuffle parental polynucleotides as indicated above originating
from wild type organisms of different genera.

The starting parental DNA sequences may be any DNA
sequences including wild-type DNA sequences, DNA sequences
encoding variants or mutants, or modifications thereof, such as
extended or elongated DNA sequences, and may also be the outcome
of DNA sequences having been subjected to one or more cycles of
shuffling (i.e. output DNA sequences) according to the method
of the invention or any other method (e.g. any of the methods
described in the prior art section), or synthetic sequences or
otherwise mutagenized sequences.

When using the method of the invention the resulting
recombined polynucleotides (i.e. shuffled DNA sequences) have
had a number of nucleotide fragments exchanged. This results in
replacement of at least one amino acid within the polypeptide
variant, if comparing it with the parent polypeptide. It is to
be understood that also silent exchanges are contemplated (i.e.
nucleotide exchanges which do not result in changes in the
amino acid sequence).
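
The distinction between silent exchanges and exchanges that replace an amino acid can be sketched as follows. The sketch assumes the standard genetic code, in-frame coding sequences of equal length, and only the unambiguous bases A, C, G and T; the function names and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame coding sequence; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_exchanges(parent, shuffled):
    """Return (silent, replacing): 0-based codon indices that differ between the
    two sequences, split by whether the encoded amino acid is unchanged."""
    if len(parent) != len(shuffled):
        raise ValueError("sequences must have the same length")
    parent, shuffled = parent.upper(), shuffled.upper()
    silent, replacing = [], []
    usable = len(parent) - len(parent) % 3
    for i in range(0, usable, 3):
        p_codon, s_codon = parent[i:i + 3], shuffled[i:i + 3]
        if p_codon == s_codon:
            continue
        if CODON_TABLE[p_codon] == CODON_TABLE[s_codon]:
            silent.append(i // 3)
        else:
            replacing.append(i // 3)
    return silent, replacing


# GCT -> GCC keeps alanine (silent); GAA -> GAT changes Glu to Asp (replacing).
print(classify_exchanges("ATGGCTGAA", "ATGGCCGAT"))  # ([1], [2])
```

Sequences containing insertions or deletions would first have to be aligned; that step is outside this sketch.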



MATERIALS AND METHODS 

EXAMPLES 

EXAMPLE 1 

Shuffling of a pool/population of evolutionary homologues
originating from bacterial hosts.

In this Example a gene shuffling method similar to the 
one described in WO 95/22625 is used: 

A population of subtilase-encoding genes or parts of such
genes is generated through isolation or by synthesis. Sources
for the genes may be as described in Siezen et al., Protein
Engineering 4 (1991) 719-737. The population may also comprise
genes encoding the pre-pro subtilases as defined in GenBank
entries A13050JL, D26542, A22550, Swiss-Prot entry SUBT_BACAM
P00732, and PD493 (Patent Application No. WO 96/34963) with
homologies (similarities) ranging from 32% to 64% as calculated
by the MegAlign software from DNASTAR Inc. (WI 53715, USA)
using the Clustal Method.
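
To illustrate what a homology range over a population means, the following minimal sketch computes pairwise percent identity from sequences that have already been aligned (gaps written as "-"). It is an assumption-laden simplification: it does not reproduce the MegAlign or Clustal similarity calculation, and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical positions over alignment columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # columns with a gap in either sequence are skipped
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_range(aligned):
    """Return (lowest, highest) pairwise percent identity in a population.

    aligned: mapping of sequence name -> aligned sequence (all equal length)
    """
    names = sorted(aligned)
    if len(names) < 2:
        raise ValueError("at least two sequences are required")
    values = [
        percent_identity(aligned[x], aligned[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    ]
    return min(values), max(values)


# Toy aligned sequences, not real subtilase data.
population = {"a": "AAAA", "b": "AAAT", "c": "AATT"}
print(identity_range(population))  # (50.0, 75.0)
```

Different conventions for gapped columns and for the denominator give different percentages, so values from this sketch are not directly comparable with the figures quoted above.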

The substrates used in the shuffling reaction are
represented by linear double stranded DNA generated by PCR
amplification using primers located at/directed towards the ends of
the DNA to be shuffled. In this instance the primers can
conveniently be constructed using the sequences surrounding the
histidine in position 64 of subtilisin BPN' and the serine in
position 221 of subtilisin BPN'. The template for this PCR can
either be plasmids containing cloned protease genes or
chromosomal DNA extracted from bacterial strains, e.g. protease
secreting bacteria isolated from soil. The substrate will typically
be generated separately for all the templates and pooled before
the shuffling reaction.

The substrates are fragmented e.g. by DNAse I treatment
or shearing by sonication as described in WO 95/22625. The
generated fragments are separated according to size by agarose gel
electrophoresis and generated fragments of the desired size,
e.g. from 10 to 50 bp, or from 30 to 100 bp, or from 50 to 150
bp, or from 100 to 200 bp, are purified from the gel.
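
The fragmentation and size selection can be mimicked in silico with a minimal sketch. It assumes a single strand cut at random, independent positions, which is only a crude stand-in for DNAse I digestion or sonication, and it replaces gel purification with a simple length filter. The function names, the cut spacing and the toy sequence are hypothetical.

```python
import random


def fragment(sequence, mean_cut_spacing, rng):
    """Cut a sequence at random positions.

    Each internal position is cut with probability 1 / mean_cut_spacing,
    so fragments are on average roughly mean_cut_spacing long.
    """
    cut_probability = 1.0 / mean_cut_spacing
    fragments = []
    start = 0
    for position in range(1, len(sequence)):
        if rng.random() < cut_probability:
            fragments.append(sequence[start:position])
            start = position
    fragments.append(sequence[start:])
    return fragments


def size_select(fragments, min_bp, max_bp):
    """Keep fragments whose length lies within [min_bp, max_bp]."""
    return [f for f in fragments if min_bp <= len(f) <= max_bp]


rng = random.Random(0)                    # fixed seed for repeatability
substrate = "ACGT" * 250                  # toy 1000-nucleotide substrate
pieces = fragment(substrate, mean_cut_spacing=50, rng=rng)
selected = size_select(pieces, min_bp=30, max_bp=100)
print(len(pieces), len(selected))
```

Narrower size windows discard more of the digested material, which is one practical consideration when choosing among the size ranges mentioned above.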

These fragments are reassembled by PCR as described in
WO 95/22625. Optionally, correctly assembled DNA fragments are
amplified by subjecting the product from the assembly reaction
to another PCR including two primers able to anneal to the ends
of correctly assembled fragments. The resulting fragments can
be cloned into suitable expression plasmids, and subsequently
screened for a specific property, such as thermostability, using
assays well known in the art.

1. A method for the construction of a library of recombined
polynucleotides from a number of different starting single or
double stranded parental DNA templates, wherein said starting
single or double stranded parental DNA templates represent
discrete points in a population of genes encoding evolutionary or
synthetic homologues of a peptide having homologies ranging
over a broad spectrum from less than 15% to more than 80%, said
population exhibiting at least one identification sequence, and
whereby said genes are subjected to a gene shuffling procedure
to generate shuffled mutants of said population of genes
representing additional discrete points between those of said
starting templates.

2. The method of claim 1, wherein said homologies range from 
less than 45%, 40%, 35%, 30%, 25%, 20%, or 15% to more than 
80%, 85%, 90%, 95%, or 99%. 

3. The method of claim 1 or 2, wherein said starting
population exhibits at least two identification sequences.

4. The method of any of the claims 1 to 3, wherein said
identification sequences correspond to amino acid sequences of
from 4 to 8 amino acid residues, which sequence is highly
conserved among the peptides encoded by said collection of
starting single or double stranded parental DNA templates,
preferably from 5 to 7 amino acid residues.

5. The method of claim 3 or 4, wherein said identification
sequences are located a distance apart corresponding to the
average size of the genes in said collection with a variation of
up to 40%.

6. The method of claim 3, wherein said variation is 20%,
15%, 10%, or 5%.

7. A method of identifying a polypeptide of interest
exhibiting improved properties in comparison to naturally occurring
or other known polypeptides of the same activity, whereby a
population of recombined polynucleotides produced by a process
according to any of the claims 1 to 6 are cloned into an
appropriate vector, said vector is transformed into a suitable host
system, to be expressed into the corresponding polypeptides,
and said polypeptides are screened in a suitable assay, and
positive polypeptides selected.

8. A method for producing a polypeptide of interest as
identified according to claim 7, whereby a vector comprising a
polynucleotide encoding said identified polypeptide is
transformed into a suitable host, said host is grown to express said
polypeptide, and the polypeptide recovered and purified.

9. The method of claim 3, wherein said peptide is a prote- 
ase, especially a subtilase. 

10. The method of claim 9, wherein said identification
sequences are located around the histidine in position 64 and the
active serine in position 221 of subtilisin BPN'.

11. The method of claim E, wherein said peptide is an
amylase, especially an α-amylase.

12. The method of claim 11, wherein said identification
sequences are located around the Asp in position 100 and the Asp
in position 328 of B. licheniformis α-amylase.

12. The method of claim 11, wherein said identification
sequences are located around the Tyr in position 8 and around Ser
in position 476 of B. licheniformis α-amylase.

13. The method of claim 8, wherein said peptide is a lipase.

14. The method of claim 13, wherein said identification
sequences are located around the Pro in position 10 and around
the His in position 285 of P. glumae lipase.

15. The method of claim 8, wherein said peptide is a
cellulase.

15. The method of claim 9, wherein said peptide is a xyla-